Abstract. It has recently been shown that the NP-hard problem of calculating
the minimum number of hybridization events that is needed to explain a
set of rooted binary phylogenetic trees by means of a hybridization network is
fixed-parameter tractable if an instance of the problem consists of precisely two
such trees. In this paper, we show that this problem remains fixed-parameter
tractable for an arbitrarily large set of rooted binary phylogenetic trees. In
particular, we present a quadratic kernel.
1. Introduction



Phylogenetic trees are a commonly used tool for representing evolutionary
relationships. Let X be a finite set representing for example biological species or,
more generally, taxa. A rooted phylogenetic X-tree is a rooted tree that has no
vertices of outdegree 1 and whose leaves are bijectively labeled by the elements
of X. Recently, rooted phylogenetic networks have become increasingly important
in analyzing evolutionary histories of sets of taxa whose past may include reticulate
evolutionary events such as horizontal gene transfer, hybridization, or
recombination. Rooted phylogenetic networks are a generalization of rooted phylogenetic
X-trees to directed acyclic graphs. In particular, vertices of indegree at least two
are called reticulation vertices and represent events in which, in the context of
hybridization, two distinct ancestral species combine their genomes and form a new
species. The number of reticulations specified by a reticulation vertex is defined as
its indegree minus one while the number of reticulations specified by a phylogenetic
network N is defined as the sum of the number of reticulations over all reticulation
vertices in N. To quantify the extent to which hybridization events have had an
impact on the evolutionary history of a set of present-day species, the following
optimization problem has attracted much interest. Let $\mathcal{T}$ be a set of rooted
phylogenetic trees on the same set of taxa. What is the minimum number of reticulations
specified by any phylogenetic network that explains each of the trees in $\mathcal{T}$? The
decision variant of this problem, called Hybridization Number, as well as precise
definitions are stated in Section 2. Since most of the research that is concerned with
this question has been done in the context of hybridization, we henceforth refer to
a phylogenetic network as a hybridization network and to reticulations specified by
a network as hybridizations.

Since Hybridization Number is APX-hard and, thus, NP-hard even for sets of
rooted phylogenetic trees consisting of precisely two binary such trees [6], many
theoretical results as well as practical algorithms have been developed for this
restricted case. In particular, it has been shown that the two-tree case is fixed-parameter
tractable (FPT), regardless of whether the two rooted phylogenetic trees are binary
or not [5] [15]. Roughly speaking, to establish these results, the authors used several
reduction rules that shrink each problem instance to a reduced (weighted) instance
whose size is linear in the value of an optimal solution. Subsequent to these results,
practical algorithms have been developed that solve Hybridization Number for
two rooted binary phylogenetic trees [1] [7] [9] [16] [H]. Instead of calculating an
optimal hybridization network directly, all these algorithms make use of the concept
of so-called agreement forests. Without going into details, an agreement forest for
two rooted binary phylogenetic trees T and T' on the same set of taxa is a
collection of disjoint subtrees that are common to T and T'. If such a collection is, in a
certain sense, acyclic and of minimum size, then its number of elements minus one
equates to the solution of Hybridization Number for $\mathcal{T} = \{T, T'\}$ [2]. However,
this framework of agreement forests can only be applied to more than two
phylogenetic trees if one is solely interested in the minimum number of hybridization
vertices, but not the actual minimum number of hybridizations specified by any
hybridization network that explains the set of trees under consideration. These two
numbers are equal in the two-tree case since each hybridization vertex has exactly
two parents [14]. Given this difficulty and the computational hardness of
Hybridization Number, it does not come as a surprise that, prior to this paper, there
were no exact algorithms that can solve Hybridization Number for more than
two trees. The only available algorithms are described in [51 [T7] and, in fact, are 
heuristics that compute lower and upper bounds for a given instance. 

In this paper, we show that Hybridization Number remains fixed-parameter 
tractable if the input to this problem consists of arbitrarily many rooted binary 
phylogenetic trees on the same set of taxa. This generalization is of significant rel- 
evance for applications in (for example) evolutionary biology since biologists usually 
construct phylogenetic trees for more than two different genes and are interested 
in the number of hybridizations necessary to explain all reconstructed gene trees 
simultaneously. Our result shows that, as in the two-tree case, this problem can 
be solved by using an FPT-algorithm. We hope that this result will facilitate the 
development of practical algorithms in the same way as it has been the case for the 
restricted two-tree version of the problem. 

The paper is organized as follows. The next section contains some notation and 
terminology that is used throughout this paper and formally states the decision 
problem Hybridization Number. Section 3 establishes the main result of this
paper, thus showing that Hybridization Number is fixed-parameter tractable by
providing a quadratic kernel. We end this paper with some concluding remarks in
Section 4.

2. Preliminaries

This section provides preliminary definitions that are used throughout this paper 
and formally states the decision problem Hybridization Number for a set of
rooted binary phylogenetic trees. Let X be a finite set. We refer to the elements 
of X as taxa. 

Phylogenetic trees. A rooted binary phylogenetic X-tree T is a rooted tree 
whose root has degree two while all other interior vertices have degree three, and 
whose leaves are bijectively labeled by the elements of X. We identify each leaf 
with its label and thus refer to X as the leaf set of T. We regard the edges of T as 
being directed away from the root. 

Hybridization networks. A hybridization network N on X is a rooted acyclic
digraph which has a single root of indegree 0 and outdegree at least 2, has no vertex
with indegree and outdegree both 1, and in which the vertices of outdegree 0 are
bijectively labeled with the elements of X. A vertex whose indegree is at least 2 is
called a hybridization vertex. A hybridization network is binary if all vertices have
indegree and outdegree at most 2 and each hybridization vertex has outdegree 1.
Note that a rooted binary phylogenetic X-tree is a binary hybridization network
on X with no hybridization vertices.

Let N be a hybridization network on X. Furthermore, let X' be a subset of X,
and let T' be a rooted phylogenetic X'-tree. Then T' is said to be a pendant
subtree of N if it is a subtree that can be detached from N by deleting a single
edge. Furthermore, if (u, v) is an edge of N, we say that u is a parent of v and v is
a child of u. Note that these definitions hold in particular for rooted phylogenetic
trees.

To quantify the number of hybridizations in a hybridization network N, the
hybridization number of N is given by

\[ h(N) = \sum_{v \neq \rho} \bigl( d^-(v) - 1 \bigr), \]

where $d^-(v)$ is the indegree of v and $\rho$ is the root of N.
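
As an illustration of this definition, the following minimal Python sketch computes h(N) from an edge list. The representation (a list of (tail, head) pairs together with an explicitly named root) and the function name `hybridization_number` are assumptions made for this example only; they do not appear in the paper.

```python
from collections import Counter


def hybridization_number(edges, root):
    """Return h(N): the sum of (indegree(v) - 1) over all non-root vertices v.

    `edges` is a list of (tail, head) pairs of a rooted acyclic digraph and
    `root` is its unique root (hypothetical input format for this sketch).
    """
    indegree = Counter(head for _tail, head in edges)
    vertices = {v for edge in edges for v in edge}
    return sum(indegree[v] - 1 for v in vertices if v != root)


# A small network with one hybridization vertex "h" (two parents "a" and "b").
network = [("r", "a"), ("r", "b"), ("a", "x"), ("a", "h"),
           ("b", "h"), ("b", "z"), ("h", "y")]
# A rooted binary tree has no hybridization vertices.
tree = [("r", "a"), ("r", "z"), ("a", "x"), ("a", "y")]
```

For the example network the only vertex with indegree larger than one is the hybridization vertex, so the sum equals 1, whereas every non-root vertex of the tree contributes 0.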

Let N again be a hybridization network on X, and let T be a rooted binary
phylogenetic X'-tree, with $X' \subseteq X$. We say that T is displayed by N if T can be
obtained from N by deleting a subset of the edges and vertices of N and suppressing
vertices with indegree and outdegree both 1. In other words, N displays T if there
exists a subgraph of N that is a subdivision of T. Intuitively, if N displays T, then
all of the ancestral relationships of T are visualized by N. Furthermore, for a set
$\mathcal{T}$ of rooted binary phylogenetic X-trees, we say that N displays $\mathcal{T}$ if N displays
each tree in $\mathcal{T}$.

The problem Hybridization Number is to compute the minimum hybridization
number of a set $\mathcal{T}$ of rooted binary phylogenetic X-trees, which is defined as
follows.

\[ h(\mathcal{T}) = \min\{ h(N) : N \text{ is a hybridization network that displays } \mathcal{T} \}. \]

This problem can formally be stated as the following decision problem.

Problem: Hybridization Number

Instance: A set $\mathcal{T}$ of rooted binary phylogenetic X-trees and a positive integer k.

Question: Is $h(\mathcal{T}) \le k$?

In the remainder of this paper, we will exclusively focus our attention on binary
hybridization networks. To see that this is sufficient, we need the following lemma
[10, Lemma 3].

Lemma 1. Let N be a hybridization network on X that displays a set $\mathcal{T}$ of rooted
binary phylogenetic X-trees. Then there exists a binary hybridization network N'
on X that displays $\mathcal{T}$ such that $h(N') = h(N)$.

Let $(\mathcal{T}, k)$ be an instance of Hybridization Number. We will show that two
reduction rules described below transform $(\mathcal{T}, k)$ into an equivalent instance $(\mathcal{T}', k)$
with a quadratic number of taxa. More precisely, $\mathcal{T}'$ is a collection of rooted binary
phylogenetic X'-trees such that $h(\mathcal{T}') \le k$ if and only if $h(\mathcal{T}) \le k$ and $|X'| < 20k^2$.

To describe the reduction rules, we need some additional definitions. Let $\mathcal{T}$ be a
set of rooted binary phylogenetic X-trees and let $X' \subseteq X$. A rooted phylogenetic
X'-tree is a common pendant subtree of $\mathcal{T}$ if it is a pendant subtree of each element
in $\mathcal{T}$. Now, let $T \in \mathcal{T}$ and let $(x_1, x_2, \ldots, x_n)$ be a tuple of elements of X with
$n \ge 2$, and let $p_i$ be the parent of the leaf labeled $x_i$ in T, for each $i \in \{1, 2, \ldots, n\}$.
Then, $(x_1, x_2, \ldots, x_n)$ is called a chain of T if either $(p_n, p_{n-1}, \ldots, p_1)$ is a directed
path in T, or $(p_n, p_{n-1}, \ldots, p_2)$ is a directed path in T and $p_1 = p_2$. Furthermore,
$(x_1, x_2, \ldots, x_n)$ is a common chain of $\mathcal{T}$ if it is a chain of each element in $\mathcal{T}$.
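
The chain condition can be checked mechanically. The sketch below assumes, purely for illustration, that a tree is given as a dictionary `parent` mapping each non-root vertex to its parent; the names `is_chain` and `is_common_chain` are hypothetical helpers and not part of the paper.

```python
def is_chain(parent, chain):
    """Check whether the tuple `chain` of leaf labels is a chain of the tree.

    `parent` maps every non-root vertex to its parent (assumed input format).
    """
    if len(chain) < 2:
        return False
    p = [parent[x] for x in chain]  # p[i] is the parent of the leaf chain[i]

    def is_upward_path(vertices):
        # (v_m, ..., v_1) is a directed path iff parent(v_i) = v_{i+1} for all i.
        return all(parent.get(vertices[i]) == vertices[i + 1]
                   for i in range(len(vertices) - 1))

    # Either (p_n, ..., p_1) is a directed path, or p_1 = p_2 and
    # (p_n, ..., p_2) is a directed path.
    return is_upward_path(p) or (p[0] == p[1] and is_upward_path(p[1:]))


def is_common_chain(parents, chain):
    """A common chain is a chain of every tree in the collection."""
    return all(is_chain(parent, chain) for parent in parents)


# Example tree: root r with children u and y; u has children v and x3;
# v has children x1 and x2.
example = {"x1": "v", "x2": "v", "v": "u", "x3": "u", "u": "r", "y": "r"}
```

In the example tree, (x1, x2, x3) is a chain through the second case of the definition (x1 and x2 share a parent), while (x3, x1) is not a chain because the order runs towards the root rather than away from it.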

Let $(\mathcal{T}, k)$ be an instance of Hybridization Number. We are now in a position
to state two reduction rules.

Subtree Reduction. For a common pendant subtree T of $\mathcal{T}$ with at least two
leaves, replace, in each element of $\mathcal{T}$, the pendant subtree T by a single leaf labeled
by a new taxon (that is not yet in X).
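
A minimal sketch of the subtree reduction follows. It assumes a representation chosen for this example only: a leaf is a string and an interior vertex is a `frozenset` of its two children, so that equal pendant subtrees compare equal regardless of child order. The function names and the fresh labels `_s0`, `_s1`, ... are hypothetical, and the labels are assumed not to occur in X.

```python
def pendant_subtrees(tree, is_root=True):
    """Yield all pendant subtrees of `tree` that have at least two leaves."""
    if isinstance(tree, frozenset):
        if not is_root:  # the whole tree cannot be detached by deleting an edge
            yield tree
        for child in tree:
            yield from pendant_subtrees(child, False)


def leaf_count(tree):
    return sum(map(leaf_count, tree)) if isinstance(tree, frozenset) else 1


def replace_subtree(tree, target, new_leaf):
    """Return a copy of `tree` in which the subtree `target` is a single leaf."""
    if tree == target:
        return new_leaf
    if isinstance(tree, frozenset):
        return frozenset(replace_subtree(c, target, new_leaf) for c in tree)
    return tree


def subtree_reduce(trees):
    """Apply the subtree reduction as often as possible."""
    trees, counter = list(trees), 0
    while True:
        common = set(pendant_subtrees(trees[0]))
        for tree in trees[1:]:
            common &= set(pendant_subtrees(tree))
        if not common:
            return trees
        target = max(common, key=leaf_count)  # reduce a largest one first
        trees = [replace_subtree(t, target, f"_s{counter}") for t in trees]
        counter += 1
```

Reducing a largest common pendant subtree first is a design choice of this sketch; it only lowers the number of iterations, since any order of applications ends with a collection that has no common pendant subtree with two or more leaves.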

Chain Reduction. For a common chain $(x_1, x_2, \ldots, x_n)$ of $\mathcal{T}$ with $n > 5k$, delete,
in each element of $\mathcal{T}$, the leaves labeled with a member of $\{x_{5k+1}, x_{5k+2}, \ldots, x_n\}$
and suppress all vertices with indegree and outdegree both 1.
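
The chain reduction can be sketched in the same spirit. The code below assumes, for illustration only, that a leaf is a string and an interior vertex is a `frozenset` of its two children, and that the caller supplies a tuple `chain` that is already known to be a common chain; `delete_leaves` and `chain_reduce` are hypothetical names.

```python
def delete_leaves(tree, doomed):
    """Delete the leaves in `doomed` and suppress vertices left with one child."""
    if not isinstance(tree, frozenset):
        return None if tree in doomed else tree
    kept = [c for c in (delete_leaves(child, doomed) for child in tree)
            if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # suppress a vertex with indegree and outdegree both 1
    return frozenset(kept)


def chain_reduce(trees, chain, k):
    """Shorten the common chain `chain` to its first 5k taxa in every tree.

    `chain` = (x_1, ..., x_n) is assumed to be a common chain of `trees`;
    this sketch does not verify that assumption.
    """
    if len(chain) <= 5 * k:
        return list(trees)  # the rule only applies if n > 5k
    doomed = set(chain[5 * k:])  # the taxa x_{5k+1}, ..., x_n
    return [delete_leaves(tree, doomed) for tree in trees]
```

Note that the taxa that are kept are $x_1, \ldots, x_{5k}$, that is, the part of the chain furthest from the root.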

We remark that similar reductions have been published in the context of calculating 
the minimum hybridization number as well as the so-called subtree prune and 
regraft distance for two phylogenies and proven to be important to develop 'efficient' 
algorithms despite the NP-hardness of the underlying problems [H IH [5] . 

To obtain a proof of the kernelization for more than two trees, we need the
following notion of generators. A binary k-reticulation generator (with $k \in \mathbb{N}^+$) is
an acyclic directed multigraph with a single root with indegree 0 and outdegree 1
and all other vertices have indegree 1 and outdegree 2, indegree 2 and outdegree 1,
or indegree 2 and outdegree 0. Let N be a binary hybridization network, with
$h(N) = k$, that has no pendant subtrees with two or more leaves. Then, a binary
k-reticulation generator is said to be the generator underlying N if it can be obtained
from N in the following way. First, delete all leaves of N and suppress each resulting
vertex with indegree and outdegree both 1. Second, if the root has outdegree 2,
add a new root with an edge to the old root. For a formal proof showing that the
resulting directed multigraph is indeed a binary k-reticulation generator, we refer
the reader to [12, Lemma 4]. Conversely, N can be reconstructed from its underlying
generator by subdividing edges, adjoining a leaf to each vertex that subdivides an
edge, or has indegree 2 and outdegree 0, via a new edge, and deleting the
outdegree-1 root. The sides of a generator are its edges (the edge sides) and its vertices
with indegree 2 and outdegree 0 (the vertex sides). Thus, each leaf of N is on a
certain side of its underlying generator. To be more formal, let x be a leaf of N
and let p be the parent of x. If p is a hybridization vertex, then p is a vertex side
of the underlying generator and we say that x is on side p. If, on the other hand,
p has indegree 1 and outdegree 2, then p is used to subdivide an edge side e of the
underlying generator (because N has no pendant subtrees with two or more leaves)
and we say that x is on side e.

Let $\mathcal{T}$ be a set of rooted binary phylogenetic X-trees with no common pendant
subtrees with two or more leaves, and let N be a binary hybridization network on X
that displays $\mathcal{T}$. Then, clearly, N has no pendant subtrees with two or more leaves.
Let G be the generator underlying N. A common chain $c = (x_1, x_2, \ldots, x_n)$ of $\mathcal{T}$
is said to survive in N if all elements of $\{x_1, x_2, \ldots, x_n\}$ are on the same edge side
of G, and c is said to be atomized in N if no two elements of $\{x_1, x_2, \ldots, x_n\}$ are
on the same side of G.

Kernels and fixed-parameter tractability. A kernelization of a parameterized
problem is a polynomial-time algorithm that maps an instance x with parameter k
to an instance x' with parameter k' such that (1) (x', k') is a yes-instance if and
only if (x, k) is a yes-instance, (2) the size of x' is bounded by a function f of k,
and (3) the size of k' is bounded by a function of k. A kernelization is usually
referred to as a kernel and the function f as the size of the kernel. Thus, a
parameterized problem admits a quadratic kernel if there exists a kernelization
with f being a quadratic function. A parameterized problem is fixed-parameter
tractable if there exists an algorithm that solves the problem in time $O(g(k)\,|x|^{O(1)})$,
with g being some function of k and |x| the size of x. Such an algorithm is called an
FPT-algorithm. It is well known that a parameterized problem is fixed-parameter
tractable if and only if it admits a kernelization and is decidable. However, a
kernel of polynomial size is not known for every fixed-parameter tractable problem.
Kernels are of particular interest because they can be used as a polynomial-time
preprocessing step that can be combined with any algorithm solving the problem.

3. Fixed-parameter tractability of Hybridization Number 

In this section, we establish the following theorem which is the main result of 
this paper. 

Theorem 1. Let $\mathcal{T}$ be a set of rooted binary phylogenetic X-trees, let $\mathcal{T}'$ be the
set of rooted binary phylogenetic X'-trees obtained from $\mathcal{T}$ by applying the subtree
reduction as often as possible and subsequently the chain reduction as often as
possible, and let $k \in \mathbb{N}^+$. Then, $h(\mathcal{T}') \le k$ if and only if $h(\mathcal{T}) \le k$ and $|X'| < 20k^2$.
In particular, Hybridization Number, parameterized by k, is fixed-parameter
tractable.

To establish Theorem 1, we need several lemmas. We start by showing that the
subtree reduction does not affect the solution of any instance $(\mathcal{T}, k)$ of Hybridization
Number.

Lemma 2. Let $\mathcal{T}$ be a set of rooted binary phylogenetic X-trees and $k \in \mathbb{N}^+$.
Furthermore, let $\mathcal{T}^{s}$ be the set of trees that results from a single application of the
subtree reduction to $\mathcal{T}$. Then $h(\mathcal{T}) \le k$ if and only if $h(\mathcal{T}^{s}) \le k$.

Proof. First assume that $h(\mathcal{T}) \le k$. Then there exists a hybridization network N
that displays $\mathcal{T}$ such that $h(N) \le k$. Without loss of generality, choose N such
that h(N) is minimized over all hybridization networks that display $\mathcal{T}$. Consider
a common pendant subtree S of $\mathcal{T}$ that was reduced under an application of the
subtree reduction. Then, S is also a pendant subtree in N because otherwise there
would exist a hybridization network that displays $\mathcal{T}$ and has a smaller hybridization
number than N. Now, by obtaining a network N' from N by replacing S with a
new vertex labeled s, it is easily checked that N' is a hybridization network that
displays $\mathcal{T}^{s}$. By reversing the argument, it follows that $h(\mathcal{T}) \le k$ if and only if
$h(\mathcal{T}^{s}) \le k$. □

As a result of this lemma, we can assume throughout the remainder of this paper
that the set of input trees $\mathcal{T}$ to Hybridization Number has no common pendant
subtree with at least two leaves. To establish a similar result for the chain reduction
(Lemma 5), we need two additional lemmas and some definitions.

For a rooted phylogenetic X-tree T and a subset X' of X, we define $T|X'$ to
be the rooted phylogenetic X'-tree obtained from T by taking the minimal subtree
of T containing all leaves in X' and suppressing all vertices with indegree 1 and
outdegree 1. Given two vertices u and v of a hybridization network, we say that u
is an ancestor of v if there is a directed path from u to v. Furthermore, a vertex
of a directed path P is called internal if it is not the first or the last vertex of P.
Lastly, two directed paths $P_1$ and $P_2$ are called internally vertex-disjoint if there is
no vertex that is an internal vertex of both $P_1$ and $P_2$.

Lemma 3. Let $\mathcal{T}$ be a set of rooted binary phylogenetic X-trees with no common
pendant subtrees with at least two leaves. Then there exists a binary hybridization
network N on X with $h(N) = h(\mathcal{T})$ that displays $\mathcal{T}$ such that each common chain
of $\mathcal{T}$ either survives or is atomized in N.

Proof. Let $N_0$ be a binary hybridization network that displays $\mathcal{T}$ such that
$h(N_0) = h(\mathcal{T})$. Note that such a network exists by Lemma 1. We will construct a network N
from $N_0$ that satisfies the statement of the lemma by considering each common
chain c of $\mathcal{T}$ that neither survives nor is atomized in $N_0$ and making changes to the
network so that c survives in N.

Let $c = (x_1, x_2, \ldots, x_n)$ be a common chain of $\mathcal{T}$ that neither survives nor is
atomized in $N_0$. Note that $n \ge 3$, since any chain of two taxa that does not survive
is, by definition, atomized, and that $N_0$ has no pendant subtrees of at least two
leaves since $\mathcal{T}$ has no common pendant subtrees of at least two leaves.

Let $G_0$ be the generator underlying $N_0$ and, for convenience, let $C = \{x_1, x_2, \ldots, x_n\}$.
Since c is not atomized, there exist taxa x and x' in C that are on the same side s
of $G_0$. Note that s can only be an edge side. Let p and p' be the parents of x
and x', respectively, and assume without loss of generality that p is an ancestor
of p'. Let e = (g, p) be the unique edge entering p. Hence, e is an edge of the path
in $N_0$ corresponding to side s.

We move all taxa of C to side s, thereby creating a network in which c survives.
More precisely, we construct networks $N_1$ and $N_2$ from $N_0$ as follows. First, delete
all taxa of C and clean up the resulting network by repeatedly deleting unlabeled
outdegree-0 vertices and suppressing vertices with indegree 1 and outdegree 1 until
none of these operations is possible (and one has thus obtained a valid
hybridization network). Call this intermediate network $N_1$. We remark that we delete
unlabeled outdegree-0 vertices because these arise whenever a leaf is deleted that
is on a vertex side of $G_0$. However, by moving such a leaf to an edge side, we
reduce the hybridization number of the resulting network which would lead to a
contradiction at the end of the proof. Thus, no taxon of C is on a vertex side of
$G_0$. Now, let e' be the edge of $N_1$ corresponding to edge e of $N_0$. Subdivide e'
by n vertices $p_1, p_2, \ldots, p_n$, creating a directed path $p_n, p_{n-1}, \ldots, p_1$, and introduce
a leaf labeled $x_i$ and an edge $(p_i, x_i)$ for each $i \in \{1, 2, \ldots, n\}$. Call the obtained
network $N_2$.

It remains to show that $N_2$ displays $\mathcal{T}$. Consider any tree $T \in \mathcal{T}$. Since $N_0$
displays T, there exists a subtree $T_0$ of $N_0$ that is a subdivision of T. Since c is a
chain of T, $T_0$ contains a subdivision of a caterpillar on C. In other words, there
exist a directed path B in $N_0$ and directed paths $L_n, L_{n-1}, \ldots, L_1$ in $N_0$ that start
on B (in that order) and lead to $x_n, x_{n-1}, \ldots, x_1$ respectively, such that the
directed paths $B, L_1, L_2, \ldots, L_n$ are pairwise internally vertex-disjoint. Moreover, B
is chosen such that the first vertex $v_c$ of B is the first vertex of $L_n$ and the last
vertex of B is the first vertex of $L_1$ and the first vertex of $L_2$. We next argue
that p is a vertex of B. Since x and x' are on the same side of $G_0$ (and p is an
ancestor of p'), there is a unique directed path from p to p'. Hence, any path
from $v_c$ to p' passes through p. Thus, B passes through p and it follows that p is
a vertex of B. If edge e = (g, p) is not an edge of B (i.e. if $x = x_n$), add g to B.
Now, recall that $N_1$ was obtained from $N_0$ by deleting and suppressing vertices.
By deleting or suppressing each vertex in $T_0$ that has been deleted or suppressed
in $N_0$ to obtain $N_1$, we obtain a subtree $T_1$ of $N_1$ that contains a subdivision of
$T|(X \setminus C)$. Hence, $N_1$ displays $T|(X \setminus C)$. Moreover, note that e' is an edge of $T_1$.
Recall that $N_2$ was obtained from $N_1$ by subdividing e' and hanging leaves labeled
by elements of C below the vertices subdividing e', and observe that T can be
obtained from $T|(X \setminus C)$ by applying the same operations. Therefore, we consider
the subtree $T_2$ of $N_2$ obtained by applying the same operations to $T_1$, and conclude
that $T_2$ contains a subdivision of T. It follows that $N_2$ displays T. Since the above
arguments hold for all $T \in \mathcal{T}$, it follows that $N_2$ displays $\mathcal{T}$.

By repeating the above construction for each common chain of $\mathcal{T}$ that does not
survive and is not atomized in $N_0$, we obtain a network N that displays $\mathcal{T}$ such
that each common chain of $\mathcal{T}$ either survives or is atomized. Moreover, the changes
that turned $N_0$ into N did not increase the reticulation number. Hence, $h(N) \le
h(N_0) = h(\mathcal{T})$. If $h(N) < h(\mathcal{T})$, we would obtain a contradiction. Therefore, we
conclude that $h(N) = h(\mathcal{T})$. □

The following lemma is implicit in [12, Theorem 3.2]. We include it here for
reasons of completeness.

Lemma 4. Let N be a binary hybridization network with $h(N) = k$, and let G
be its underlying generator. Then G has at most $4k - 1$ edge sides and at most k
vertex sides. In particular, G has at most $5k - 1$ sides.

Proof. Let $n_0$ be the number of vertices in G with indegree 2 and outdegree 0,
let $n_1$ be the number of vertices in G with indegree 2 and outdegree 1, and let $n_2$
be the number of vertices in G with indegree 1 and outdegree 2. Then, the total
indegree of G is $n_2 + 2n_1 + 2n_0$ while, considering the root vertex with indegree 0 and
outdegree 1, the total outdegree of G is $1 + 2n_2 + n_1$. Hence, by the Handshaking
Lemma, we have $n_2 + 2n_1 + 2n_0 = 1 + 2n_2 + n_1$ and, therefore, $n_2 = n_1 + 2n_0 - 1$.
Since the number of edge sides of G, denoted $|E(G)|$, is equal to the total indegree
of G and noting that $n_0 + n_1 = k$, we have

\[ |E(G)| = n_2 + 2n_1 + 2n_0 = 3n_1 + 4n_0 - 1 \le 4k - 1. \]

Furthermore, since each vertex side of G is a vertex with indegree 2, G has at
most k such sides, thereby establishing the lemma. □
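
The counting argument above can be sanity-checked numerically. The following sketch simply evaluates the formulas from the proof for given values of $n_0$ and $n_1$; the function name `generator_side_counts` is hypothetical and the code does not construct any generator.

```python
def generator_side_counts(n0, n1):
    """Return (edge sides, vertex sides) of a generator, following the proof.

    n0: number of vertices with indegree 2 and outdegree 0 (the vertex sides),
    n1: number of vertices with indegree 2 and outdegree 1.
    """
    n2 = n1 + 2 * n0 - 1                  # from the Handshaking Lemma
    edge_sides = n2 + 2 * n1 + 2 * n0     # total indegree of the generator
    vertex_sides = n0
    return edge_sides, vertex_sides


def bounds_hold(k):
    """Check the bounds of the lemma for every split n0 + n1 = k."""
    for n0 in range(k + 1):
        edges, vertices = generator_side_counts(n0, k - n0)
        if edges > 4 * k - 1 or vertices > k or edges + vertices > 5 * k - 1:
            return False
    return True
```

The bounds are attained when every vertex of indegree 2 has outdegree 0, that is, when $n_0 = k$ and $n_1 = 0$.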

Lemma 5. Let $\mathcal{T}$ be a set of rooted binary phylogenetic X-trees and $k \in \mathbb{N}^+$.
Furthermore, let $\mathcal{T}^{c}$ be the set of trees that results from a single application of the
chain reduction to $\mathcal{T}$. Then $h(\mathcal{T}) \le k$ if and only if $h(\mathcal{T}^{c}) \le k$.

Proof. Let $c = (x_1, x_2, \ldots, x_n)$ be a common chain of $\mathcal{T}$ which has been reduced
by a chain reduction to a common chain $c' = (x_1, x_2, \ldots, x_{5k})$ of $\mathcal{T}^{c}$. Thus, $n > 5k$.

First, suppose that $h(\mathcal{T}) \le k$. Then, by Lemma 3, there exists a binary
hybridization network N, with $h(N) \le k$, that displays $\mathcal{T}$ such that any common chain
of $\mathcal{T}$ either survives or is atomized in N. Furthermore, by Lemma 4, the generator
underlying N has at most $5k - 1$ sides. Hence, by the pigeonhole principle, c
cannot be atomized in N and, therefore, survives in N. Now, let N' be the network
obtained from N by replacing c with c'. More precisely, delete all leaves labeled by
taxa in $\{x_{5k+1}, x_{5k+2}, \ldots, x_n\}$ and suppress all resulting vertices of indegree and
outdegree both 1. Then, as N displays $\mathcal{T}$, it is easily checked that N' displays $\mathcal{T}^{c}$
and $h(N') \le k$. Thus, $h(\mathcal{T}^{c}) \le k$.

To show the other direction, suppose that $h(\mathcal{T}^{c}) \le k$. Then, by Lemma 3,
there exists a binary hybridization network N' with $h(N') \le k$, that displays $\mathcal{T}^{c}$
such that any common chain of $\mathcal{T}^{c}$ either survives or is atomized in N'. By again
using the pigeonhole principle, c' cannot be atomized in N' since it has 5k taxa
while the generator underlying N' has at most $5k - 1$ sides. Hence, c' survives
in N'. Now, let N be the network obtained from N' by replacing c' with c. To
be precise, let e be the edge entering the parent, say $p_{5k}$, of the vertex labeled $x_{5k}$
in N'. Since c' survives in N', note that e is unique. Subdivide e by $n - 5k$ new
vertices $p_{5k+1}, p_{5k+2}, \ldots, p_n$, creating a directed path $p_n, p_{n-1}, \ldots, p_{5k+1}$, and add
a leaf labeled $x_i$ and an edge $(p_i, x_i)$ for each $i \in \{5k + 1, 5k + 2, \ldots, n\}$. Then, as
N' displays $\mathcal{T}^{c}$, it is easily checked that N displays $\mathcal{T}$ and has $h(N) \le k$. Thus,
$h(\mathcal{T}) \le k$. □

We next show that the subtree and chain reduction can be applied to a collection
of rooted binary phylogenetic X-trees until the label set of the resulting collection
of trees has size bounded by a quadratic function of $h(\mathcal{T})$. For the proof, we follow
an approach similar to the one taken by Kelk et al. [T^ Lemma 3.2]. 

Lemma 6. Let $\mathcal{T}$ be a set of rooted binary phylogenetic X-trees, let $\mathcal{T}'$ be the set
of rooted binary phylogenetic X'-trees obtained from $\mathcal{T}$ by applying the subtree and
chain reduction until no further reduction is possible, and let $k \in \mathbb{N}^+$. If $h(\mathcal{T}) \le k$,
then $|X'| < 20k^2$.

Proof. As $h(\mathcal{T}) \le k$, it follows from Lemmas 2 and 5 that $h(\mathcal{T}') \le k$. Let N be a
binary hybridization network that displays $\mathcal{T}'$ such that $h(N) \le k$. Furthermore,
let G be its underlying binary h(N)-reticulation generator.

Observe that N has no pendant subtrees of size at least 2, since otherwise $\mathcal{T}'$
would have a common pendant subtree, thereby contradicting that the subtree
reduction has been applied as often as possible. Furthermore, N does not have
more than 5k leaves that are on the same side of G, since otherwise $\mathcal{T}'$ would
have a common chain of size greater than 5k, thereby contradicting that the chain
reduction has been applied as often as possible.

Thus, N has one leaf per vertex side of G and at most 5k leaves per edge
side of G. By Lemma 4 (and because $h(N) \le k$), G has at most $4k - 1$ edge
sides and at most k vertex sides. Thus, the total number of leaves is at most
$5k \cdot (4k - 1) + k = 20k^2 - 4k < 20k^2$. It now follows that $|X'| < 20k^2$. □
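
To make the bound concrete, consider the value k = 2, chosen here only as an illustration. The generator then has at most $4k - 1 = 7$ edge sides, each carrying at most $5k = 10$ leaves, and at most $k = 2$ vertex sides, each carrying one leaf:

```latex
\[
  |X'| \;\le\; 5k\,(4k - 1) + k
       \;=\; 10 \cdot 7 + 2
       \;=\; 72
       \;=\; 20 \cdot 2^2 - 4 \cdot 2
       \;<\; 80 \;=\; 20k^2 .
\]
```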

Now, Theorem 1 follows from Lemmas 2, 5, and 6.

4. Concluding remarks 

While Theorem 1 proves the existence of an FPT-algorithm to solve
Hybridization Number, it does not describe an explicit algorithm to do so. In order to
obtain such an algorithm, one needs an exponential-time exact algorithm to solve
an instance of Hybridization Number after it has been kernelized. One possible



10 LEO VAN lERSEL AND SIMONE LINZ 

way to design an FPT-algorithm for Hybridization Number is the following.
Theorem 2 of [13] establishes an algorithm, called Clustistic, that, given a set
of rooted binary phylogenetic trees and an integer k, finds all binary hybridization
networks that represent all clusters of the trees (in the so-called softwired sense,
see e.g. [10]) and have hybridization number at most k. Since any network that
displays a given set of rooted binary phylogenetic trees also represents all clusters
of those trees, Clustistic finds it. Thus, an exponential-time exact algorithm
for Hybridization Number can be obtained by using Clustistic and checking
for each returned network if it displays the input trees (e.g. using the algorithm
in [11], which is exponential in the number of hybridizations of a given
hybridization network). In combination with the presented kernelization, this leads to an
FPT-algorithm for Hybridization Number. We omit the details of this algorithm
as its theoretical worst-case running time is not necessarily the best and we expect
that methods are possible that are much faster in practice. We also remark that if
one allows weighted chains, as in [5], then a slightly modified chain reduction can
be used to obtain a linear kernel for a modified problem, where each common chain
is associated with a weight.

A major open problem is to show whether or not it is also fixed-parameter 
tractable to compute the minimum hybridization number of a set of arbitrary rooted 
phylogenetic trees; thus allowing for trees that are nonbinary. 